Together We Can: Bilingual Bootstrapping for WSD
Abstract
Recent work on bilingual Word Sense Disambiguation (WSD) has shown that a resource-deprived language (L1) can benefit from the annotation work done in a resource-rich language (L2) via parameter projection. However, this method assumes the presence of sufficient annotated data in one resource-rich language, which may not always be available. Instead, we focus on the situation where there are two resource-deprived languages, both having a very small amount of seed annotated data and a large amount of untagged data. We then use bilingual bootstrapping, wherein a model trained using the seed annotated data of L1 is used to annotate the untagged data of L2, and vice versa, using parameter projection. The untagged instances of L1 and L2 that are annotated with high confidence are then added to the seed data of the respective languages, and the above process is repeated. Our experiments show that such a bilingual bootstrapping algorithm, when evaluated on two different domains with small seed sizes using Hindi (L1) and Marathi (L2) as the language pair, performs better than monolingual bootstrapping and significantly reduces annotation cost.
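To make the loop described in the abstract concrete, here is a minimal Python sketch of bilingual bootstrapping. It is an illustration under stated assumptions, not the authors' implementation. It assumes (1) sense ids are shared across the two languages (for example, through aligned synsets), (2) a hypothetical word-to-word bilingual dictionary is enough to "project" model parameters, and (3) a simple count-based naive Bayes classifier stands in for the real WSD model. All function names (`train`, `project`, `predict`, `split_confident`, `bilingual_bootstrap`) and the 0.8 confidence threshold are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy representation (an assumption, not the paper's features):
#   instance        = (target_word, tuple_of_context_words)
#   tagged instance = (instance, sense_id)
# Sense ids are ASSUMED to be shared across both languages.

def train(tagged):
    """Count sense priors per target word and context-word counts per sense."""
    prior = defaultdict(Counter)   # target word -> Counter(sense)
    ctx = defaultdict(Counter)     # sense -> Counter(context word)
    for (target, context), sense in tagged:
        prior[target][sense] += 1
        ctx[sense].update(context)
    return prior, ctx

def project(model, dictionary):
    """Sketch of parameter projection: rename words via a (hypothetical)
    bilingual dictionary and keep sense ids unchanged."""
    prior, ctx = model
    p_prior, p_ctx = defaultdict(Counter), defaultdict(Counter)
    for target, senses in prior.items():
        p_prior[dictionary.get(target, target)].update(senses)
    for sense, words in ctx.items():
        for word, count in words.items():
            p_ctx[sense][dictionary.get(word, word)] += count
    return p_prior, p_ctx

def predict(model, instance):
    """Return (best_sense, posterior of best sense) using add-one smoothing."""
    prior, ctx = model
    target, context = instance
    senses = prior.get(target)
    if not senses:
        return None, 0.0
    vocab = {w for counts in ctx.values() for w in counts} | set(context)
    scores = {}
    for sense, n in senses.items():
        total = sum(ctx[sense].values())
        score = math.log(n)
        for w in context:
            score += math.log((ctx[sense][w] + 1) / (total + len(vocab)))
        scores[sense] = score
    best = max(scores, key=scores.get)
    z = sum(math.exp(v - scores[best]) for v in scores.values())
    return best, 1.0 / z

def split_confident(model, pool, threshold):
    """Split an untagged pool into confidently labelled and remaining instances."""
    confident, rest = [], []
    for inst in pool:
        sense, conf = predict(model, inst)
        if sense is not None and conf >= threshold:
            confident.append((inst, sense))
        else:
            rest.append(inst)
    return confident, rest

def bilingual_bootstrap(seed1, seed2, untagged1, untagged2, d12, d21,
                        threshold=0.8, rounds=5):
    """L1's model (projected to L2) labels L2's pool and vice versa;
    confident labels grow each language's seed set, then the loop repeats."""
    seed1, seed2 = list(seed1), list(seed2)
    pool1, pool2 = list(untagged1), list(untagged2)
    for _ in range(rounds):
        # Train both models on the current seeds before adding anything new.
        l1_model_for_l2 = project(train(seed1), d12)
        l2_model_for_l1 = project(train(seed2), d21)
        new2, pool2 = split_confident(l1_model_for_l2, pool2, threshold)
        new1, pool1 = split_confident(l2_model_for_l1, pool1, threshold)
        if not new1 and not new2:
            break  # nothing crossed the threshold, so stop early
        seed1 += new1
        seed2 += new2
    return seed1, seed2
```

In use, each language contributes two seed examples per ambiguous word plus an untagged pool, and `d12`/`d21` are inverse dictionaries between the two vocabularies. Training both models before either seed set grows keeps each round symmetric. Stopping when no instance crosses the threshold mirrors the "repeat until nothing new is added" reading of the abstract, which is itself an assumption.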
Similar resources
Word Sense Disambiguation Using Label Propagation Based Semi-Supervised Learning
Shortage of manually sense-tagged data is an obstacle to supervised word sense disambiguation methods. In this paper we investigate a label propagation based semi-supervised learning algorithm for WSD, which combines labeled and unlabeled data in the learning process to fully realize a global consistency assumption: similar examples should have similar labels. Our experimental results on benchmark c...
Bootstrapping Without the Boot
What: We like minimally supervised learning (bootstrapping). Let’s convert it to unsupervised learning (“strapping”). How: If the supervision is so minimal, let’s just guess it! Lots of guesses, lots of classifiers. Try to predict which one looks plausible (!?!). We can learn to make such predictions. Results (on WSD): Performance actually goes up! (Unsupervised WSD for translational senses, Eng...
Word Sense Disambiguation Using Sense Examples Automatically Acquired from a Second Language
We present a novel almost-unsupervised approach to the task of Word Sense Disambiguation (WSD). We build sense examples automatically, using large quantities of Chinese text, and English-Chinese and Chinese-English bilingual dictionaries, taking advantage of the observation that mappings between words and meanings are often different in typologically distant languages. We train a classifier on ...
Combining Machine Readable Lexical Resources and Bilingual Corpora for Broad Word Sense Disambiguation
This paper describes a new approach to word sense disambiguation (WSD) based on an automatically acquired "word sense division." The semantically related sense entries in a bilingual dictionary are arranged in clusters using a heuristic labeling algorithm to provide a more complete and appropriate sense division for WSD. Multiple translations of senses serve as outside information for automatic tag...
REINA at CLEF 2009 Robust-WSD Task: Partial Use of WSD Information for Retrieval
This paper describes the participation of the REINA research group in the CLEF 2009 Robust-WSD Task. We participated in both the monolingual and bilingual subtasks. In past editions of the robust task, our research group obtained very good results for non-WSD experiments by applying local query expansion with co-occurrence-based thesauri constructed from windows of terms. We applied it again. For WS...